
Roughly, I am trying to do this:
from clearml import Dataset

def upload_clearml_dataset_from_external_source(
    source_url,
    dataset_name: str,
    dataset_project: str,
):
    # reference:
    dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    dataset.add_external_files(source_url=source_url)
    dataset.upload()
    dataset.finalize()

upload_clearml_dataset_from_external_source("...", name, project)
Dataset.get(dataset_project=project, dataset...
Ignore the indentation, it didn't align when I copied it over.
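For context, this is roughly how I was planning to read the dataset back afterwards (just a sketch; I'm assuming only_completed=True and list_files() are the right way to verify the finalized version):
from clearml import Dataset

# sketch: fetch the finalized dataset back by project/name and inspect what was registered
ds = Dataset.get(dataset_project=project, dataset_name=name, only_completed=True)
print(ds.id)
print(ds.list_files()[:10])  # first few registered external file entries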
Any follow-up on this question? Recap:
Task.add_requirements() doesn't seem to install the package for my experiment.
Additionally, as an alternative to add_requirements() if I can't get it working, is there an example of using a docker bash init script you can point me to?
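To frame what I'm picturing for the docker route, something along these lines (a rough sketch; I'm assuming set_base_docker's docker_setup_bash_script argument is the relevant hook, and the pip install line is just an example):
from clearml import Task

task = Task.init(project_name="my_project", task_name="my_task")
# sketch: run bash lines inside the container before the task itself starts
task.set_base_docker(
    docker_image="python:3.10",
    docker_setup_bash_script=["pip install fastparquet"],
)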
Executing task id [7605f1e5ce6b45e99e9302d93bc3bac6]:
repository = git@xxx
branch = xxx
version_num = 9dca88fa23ff93d446eb2ff7d615d7ade213c8aa
tag =
docker_cmd = iocr.io/xxx
entry_point = clearml_init.py
working_dir = dev
Based on the logging, working_dir = dev is the problem. I need a way to override the working_dir.
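Something like this is what I'm after (a sketch; I'm assuming Task.set_script accepts a working_dir argument that the agent respects):
from clearml import Task

task = Task.init(project_name=PROJECT_NAME, task_name=TASK_NAME)
# sketch: force the working directory back to the repo root instead of "dev"
task.set_script(working_dir=".")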
Hi @<1523701070390366208:profile|CostlyOstrich36>, I tried out the Task.add_requirements way to add packages, but it doesn't seem to be working as I expected. Here is the snippet I used to set this up:
from clearml import Task

Task.force_store_standalone_script()
add_packages = ["fastparquet"]
for pkg in add_packages:
    Task.add_requirements(pkg)
task = Task.init(project_name=project_name, task_name=task_name)
task.set_base_docker(docker_arguments="--env CLEARML_AGENT_SKIP_PYTHON_ENV_INSTA...
Confirmed that without task.set_repo, it comes down to the same error: ModuleNotFoundError: No module named 'src'
Hi John, the dataset.squash doc says "If a set of versions are given it will squash the versions diff into a single version". I want to double-check: will it only keep the latest version? Because I don't want anything from the old versions, even if an old version has more content than the latest version.
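For reference, this is how I was planning to call it (a sketch; the name and ids below are placeholders):
from clearml import Dataset

# sketch: squash two dataset versions into a single new dataset
squashed = Dataset.squash(
    dataset_name="my_dataset_squashed",  # hypothetical name
    dataset_ids=["<old_version_id>", "<latest_version_id>"],  # placeholder ids
)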
from pprint import pprint
from clearml import Task

PROJECT_NAME = "test"
TASK_NAME = "test_connect"
QUEUE_NAME = "default"

task = Task.init(project_name=PROJECT_NAME, task_name=TASK_NAME)
config = {
    "name": "foo",
    "arg1": "bar",
}
task.connect(config)
task.execute_remotely(queue_name=QUEUE_NAME)
# ------------- end of setup -------------

def dummy_op(config):
    pprint(config)
    return config

dummy_op(config)
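As a separate sanity check, I also dumped what actually got registered (a sketch; I'm assuming task.get_parameters() reflects what connect() registered, including any UI edits when running remotely):
# sketch: inspect what task.connect() registered as hyperparameters
registered = task.get_parameters()
pprint(registered)  # expecting keys like "General/name" and "General/arg1"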
Screenshot also provided to show that the "edit" button only appears in user properties, not hyperparameters.

print("Model.id:", m.id) # <- this is returning model_id, but I need project_id or project_name
print("Model.data:", m.data) # <- AttributeError: 'Model' object has no attribute 'data'
Aha, I see. That works to retrieve the project id.
A side question: I notice that in the ClearML design, Task, Model, and Dataset objects all have their own base model, but there is no project object. In terms of projects, is there a quick way to retrieve a project name based on its id?
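In case it clarifies what I'm looking for, something along these lines (a rough sketch; I'm assuming the low-level APIClient exposes projects.get_by_id, and I haven't verified the exact response shape):
from clearml.backend_api.session.client import APIClient

# sketch: look up a project name from its id via the backend API
client = APIClient()
project = client.projects.get_by_id(project="<project_id>")  # placeholder id
print(project.name)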
Installing in dev mode is the easiest option, without having to publish it first.
Sure, essentially my local Python project is organized using the "src layout" and looks like this:
foo/
|-- src/
|   |-- module.py
|-- pyproject.toml
|-- clearml_tasks/
|   |-- task1.py
In the project it would use absolute imports like from foo import module, and I would install the foo project in editable mode during setup.
When I try to create a ClearML task and send it to the remote server using the above approach (leveraging requirements.txt to configure library dependencies, and pro...
I gave it a try to switch from Task.create() to Task.init(). I think I am pretty close to switching to init(), but I still have the ModuleNotFoundError: No module named 'src' issue when using Task.init().
My project setup looks like this:
project_root/
|-- src/
|-- runbooks/
|   |-- run_task.py
So if I use Task.create(repo=xx, script="runbooks/run_task.py") it works, but if I switch to Task.init() with the same repo setup (task.set_repo, and then follow by ...
Would it cause a problem to manually set the repo when using Task.init()?
I can try taking it out to see if it fixes the issue, but I feel it is not the root cause.
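If it helps, one thing I could try is forcing the path at the top of run_task.py (a crude sketch, under my assumption that the missing 'src' module is a path issue; directory names follow the layout above):
import sys
from pathlib import Path

# sketch: run_task.py lives in <repo_root>/runbooks/, so go up one level for the repo root
repo_root = Path(__file__).resolve().parents[1]
# make both the repo root and the src/ folder importable before project imports
sys.path.insert(0, str(repo_root))
sys.path.insert(0, str(repo_root / "src"))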
Thanks for the tips! I will take them for a spin
And how would I do that? "check apiserver logs"
Actually I am not sure. For enterprise users, is it most commonly self-hosted?
I see. Is there anything I can check from the user perspective?
That is just to check that the same functionality used in Task.create() can be achieved with the Task.init() workflow. Not technically required.
task = Task.init(
    project_name=PROJECT_NAME,
    task_name=TASK_NAME,
    task_type=Task.TaskTypes.data_processing,
)
task.set_repo(
    repo="git@xxx.git",
    branch=branch,
)
task.set_base_docker(
    docker_image="docker-image",
)
task.execute_remotely(queue_name=QUEUE_NAME)
This is how I use Task.init.
I used this setup to load a pretty big dataset from S3:
dataset.add_external_files(
    source_url
)
dataset.upload(
    verbose=verbose
)
dataset.finalize()
But then I'm seeing an error complaining that the dataset doesn't exist. So my best guess is that the uploading is still happening in the background while the code has moved forward to try to do something with that dataset.
So I am questioning if I have to explicitly add some logic to wait f...
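Something like this is what I have in mind as the waiting logic (a rough sketch; I'm assuming Dataset.get raises ValueError while the dataset isn't visible yet, and that only_completed filters out unfinished versions):
import time
from clearml import Dataset

# sketch: poll until the finalized dataset is visible, then return it
def wait_for_dataset(dataset_project, dataset_name, timeout_s=600, poll_s=10):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            return Dataset.get(
                dataset_project=dataset_project,
                dataset_name=dataset_name,
                only_completed=True,
            )
        except ValueError:
            time.sleep(poll_s)
    raise TimeoutError(f"dataset {dataset_name} not available after {timeout_s}s")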